(54) Phoneme-based speech synthesis

(57) Statistical data including an average value, standard deviation, and minimum value of a phoneme duration of each phoneme is stored in a memory. When speech production time is determined for a phoneme string in a predetermined expiratory paragraph, the total phoneme duration of the phoneme string is set so as to become equal to the speech production time. Based on the set phoneme duration, phonemes are connected and a speech waveform is generated. To set a phoneme duration for each phoneme, a phoneme duration initial value is first set based on an average value, obtained by equally dividing the speech production time by the phonemes of the phoneme string, and a phoneme duration range, set based on statistical data of each phoneme. Then, the phoneme duration initial value is adjusted based on the statistical data and the speech production time.
Description

BACKGROUND OF THE INVENTION

[0001] The present invention relates to a method and an apparatus for speech synthesis utilizing a rule-based synthesis method, and a storage medium storing computer-readable programs for realizing the speech synthesizing method.

[0002] As a method of controlling a phoneme duration, a conventional rule-based speech synthesizing apparatus employs a control rule method determined based on statistics related to a phoneme duration (Yoshinori KOUSAKA, Youichi TOUKURA, "Phoneme Duration Control for Rule-Based Speech Synthesis," The Journal of the Institute of Electronics and Communication Engineers of Japan, vol. J67-A, No. 7 (1984) pp. 629-636), or a method of employing Categorical Multiple Regression as a technique of multiple regression analysis (Tetsuya SAKAYORI, Shoichi SASAKI, Hiroo KITAGAWA, "Prosodies Control Using Categorical Multiple Regression for Rule-Based Synthesis," Report of the 1986 Autumn Meeting of the Acoustic Society of Japan, 3-4-17 (1986-10)).
[0003] However, according to the above conventional technique, it is difficult to specify the speech production time of a phoneme string. For instance, in the control rule method, it is difficult to determine a control rule that corresponds to a specified speech production time. Moreover, if input data includes an exception in the control rule method, or if a satisfactory estimation value is not obtained in the method of Categorical Multiple Regression, it becomes difficult to obtain a phoneme duration that sounds natural.

[0004] In a case of controlling a phoneme duration by using control rules, it is necessary to weigh the statistics (average value, standard deviation and so on) while taking into consideration the combination of preceding and succeeding phonemes, or it is necessary to set an expansion coefficient. There are various factors to be manipulated, e.g., a combination of phonemes depending on each case, parameters such as weighting and expansion coefficients, and the like. Moreover, the operation method (control rules) must be determined by rule of thumb. Therefore, in a case where a speech production time of a phoneme string is specified, the number of combinations of phonemes becomes extremely large. Furthermore, it is difficult to determine control rules applicable to any combination of phonemes in which a total phoneme duration is close to the specified speech production time.

SUMMARY OF THE INVENTION

[0005] The present invention is made in consideration of the above situation, and has as its object to provide a speech synthesizing method and apparatus as well as a storage medium which enable setting a phoneme duration for a phoneme string so as to achieve a specified speech production time, and which can provide a natural phoneme duration regardless of the length of speech production time.

[0006] In order to attain the above object, the speech synthesizing apparatus according to an embodiment of the present invention has the following configuration. More specifically, the speech synthesizing apparatus for performing speech synthesis according to an inputted phoneme string comprises: storage means for storing statistical data related to a phoneme duration of each phoneme; determining means for determining speech production time of a phoneme string in a predetermined section; setting means for setting a phoneme duration corresponding to the speech production time of each phoneme constructing the phoneme string, based on the statistical data of each phoneme obtained from said storage means; and generating means for generating a speech waveform by connecting phonemes using the phoneme duration.

[0007] Furthermore, the present invention provides a speech synthesizing method executed by the above speech synthesizing apparatus. Moreover, the present invention provides a storage medium storing control programs for having a computer realize the above speech synthesizing method.

[0008] Other features and advantages of the present invention will be apparent from the following description taken in conjunction with the accompanying drawings, in which like reference characters designate the same or similar parts throughout the figures thereof.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.

Fig. 1 is a block diagram showing a construction of a speech synthesizing apparatus according to an embodiment of the present invention;

Fig. 2 is a block diagram showing a flow structure of the speech synthesizing apparatus according to the embodiment of the present invention;
Fig. 3 is a flowchart showing speech synthesis steps according to the embodiment of the present invention;

Fig. 4 is a table showing a configuration of phoneme data according to a first embodiment of the present invention;

Fig. 5 is a flowchart showing a determining process of a phoneme duration according to the first embodiment of the present invention;

Fig. 6 is a view showing an example of an inputted phoneme string;

Fig. 7 is a table showing a data configuration of a coefficient table storing coefficients $a_{jk}$ for Categorical Multiple Regression according to a second embodiment of the present invention;

Fig. 8 is a table showing a data configuration of phoneme data according to the second embodiment of the present invention; and

Figs. 9A and 9B are flowcharts showing a determining process of a phoneme duration according to the second embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

[0010] Preferred embodiments of the present invention will be described in detail in accordance with the accompanying drawings.

[First Embodiment]

[0011] Fig. 1 is a block diagram showing a construction of a speech synthesizing apparatus according to a first embodiment of the present invention. Reference numeral 101 denotes a CPU which performs various controls in the rule-based speech synthesizing apparatus of the present embodiment. Reference numeral 102 denotes a ROM where various parameters and control programs executed by the CPU 101 are stored. Reference numeral 103 denotes a RAM which stores control programs executed by the CPU 101 and serves as a work area of the CPU 101. Reference numeral 104 denotes an external memory such as a hard disk, floppy disk, CD-ROM and the like. Reference numeral 105 denotes an input unit comprising a keyboard, a mouse and so forth. Reference numeral 106 denotes a display for performing various displays according to the control of the CPU 101. Reference numeral 6 denotes a speech synthesizer for generating synthesized speech. Reference numeral 107 denotes a speaker where speech signals (electric signals) outputted by the speech synthesizer 6 are converted to sound and outputted.
[0012] Fig. 2 is a block diagram showing a flow structure of the speech synthesizing apparatus according to the present embodiment. Functions to be described below are realized by the CPU 101 executing control programs stored in the ROM 102 or executing control programs loaded from the external memory 104 to the RAM 103.
[0013] Reference numeral 1 denotes a character string input unit for inputting a character string of speech to be synthesized, i.e., phonetic text, which is inputted by the input unit 105. For instance, if the speech to be synthesized is "O·N·S·E·I", the character string input unit 1 inputs a character string "o, n, s, e, i". This character string sometimes contains a control sequence for setting the speech production speed or the pitch of voice. Reference numeral 2 denotes a control data storage unit for storing, in internal registers, information which is found to be a control sequence by the character string input unit 1, and control data such as the speech production speed and pitch of voice inputted from a user interface. Reference numeral 3 denotes a phoneme string generation unit which converts a character string inputted by the character string input unit 1 into a phoneme string. For instance, the character string "o, n, s, e, i" is converted to a phoneme string "o, X, s, e, i". Reference numeral 4 denotes a phoneme string storage unit for storing the phoneme string generated by the phoneme string generation unit 3 in the internal registers. Note that the RAM 103 may serve as the aforementioned internal registers.

[0014] Reference numeral 5 denotes a phoneme duration setting unit which sets a phoneme duration in accordance with the control data, representing speech production speed, stored in the control data storage unit 2, and the type of phoneme stored in the phoneme string storage unit 4. Reference numeral 6 denotes a speech synthesizer which generates synthesized speech from the phoneme string in which phoneme duration is set by the phoneme duration setting unit 5 and the control data, representing pitch of voice, stored in the control data storage unit 2.
[0015] Next, description will be provided on setting a phoneme duration, which is executed by the phoneme duration setting unit 5. In the following description, $\Omega$ indicates a set of phonemes. As an example of $\Omega$, the following may be used:

$\Omega$ = {a, e, i, o, u, X (syllabic nasal), b, d, g, m, n, r, w, y, z, ch, f, h, k, p, s, sh, t, ts, Q (double consonant)}

[0016] Herein, it is assumed that a phoneme duration setting section is an expiratory paragraph (section between pauses). The phoneme duration $d_i$ for each phoneme $\alpha_i$ of the phoneme string is determined such that the phoneme string constructed by phonemes $\alpha_i$ ($1 \le i \le N$) in the phoneme duration setting section is phonated within the speech production time $T$, determined based on the control data representing speech production speed stored in the control data storage unit 2. In other words, the phoneme duration $d_i$ (equation (1b)) for each $\alpha_i$ (equation (1a)) of the phoneme string is determined so as to satisfy the equation (1c).

$$\alpha_i \in \Omega \quad (1 \le i \le N) \tag{1a}$$

$$d_i \quad (1 \le i \le N) \tag{1b}$$

$$T = \sum_{i=1}^{N} d_i \tag{1c}$$

[0017] Herein, the phoneme duration initial value of the phoneme $\alpha_i$ is defined as $d_{\alpha_i 0}$. The phoneme duration initial value $d_{\alpha_i 0}$ is obtained by, for instance, dividing the speech production time $T$ by the number $N$ of phonemes in the phoneme string. With respect to the phoneme $\alpha_i$, an average value, standard deviation, and the minimum value of the phoneme duration are respectively defined as $\mu_{\alpha_i}$, $\sigma_{\alpha_i}$, $d_{\alpha_i \min}$. Using these values, the initial value $d_{\alpha_i}$ is determined by the equation (2), and the obtained value is set as a new phoneme duration initial value. More specifically, the average value, standard deviation value, and minimum value of the phoneme duration are obtained for each type of the phoneme (for each $\alpha_i$), stored in a memory, and the initial value of the phoneme duration is determined again using these values.



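$$
d_{\alpha_i} =
\begin{cases}
\max(\mu_{\alpha_i} - 3\sigma_{\alpha_i},\ d_{\alpha_i \min}) & \text{where } d_{\alpha_i 0} < \max(\mu_{\alpha_i} - 3\sigma_{\alpha_i},\ d_{\alpha_i \min}) \\
d_{\alpha_i 0} & \text{where } \max(\mu_{\alpha_i} - 3\sigma_{\alpha_i},\ d_{\alpha_i \min}) \le d_{\alpha_i 0} \le \mu_{\alpha_i} + 3\sigma_{\alpha_i} \\
\mu_{\alpha_i} + 3\sigma_{\alpha_i} & \text{where } \mu_{\alpha_i} + 3\sigma_{\alpha_i} < d_{\alpha_i 0}
\end{cases}
\tag{2}
$$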

[0018] Using the phoneme duration initial value $d_{\alpha_i}$ obtained in this manner, the phoneme duration $d_i$ is determined according to the following equation (3a). Note that if the obtained phoneme duration $d_i$ satisfies $d_i < \theta_{\alpha_i}$, where $\theta_{\alpha_i}$ (> 0) is a threshold value, $d_i$ is set according to equation (3b). The reason that $d_i$ is set to $\theta_{\alpha_i}$ is that reproduced speech becomes unnatural if $d_i$ is too short.

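$$d_i = d_{\alpha_i} + \rho\,\sigma_{\alpha_i}^2 \tag{3a}$$

where

$$\rho = \frac{T - \sum_{i=1}^{N} d_{\alpha_i}}{\sum_{i=1}^{N} \sigma_{\alpha_i}^2}$$

$$d_i = \theta_{\alpha_i} \tag{3b}$$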

[0019] More specifically, the sum of the updated initial values of the phoneme duration is subtracted from the speech production time $T$, and the resultant value is divided by the sum of squares of the standard deviation $\sigma_{\alpha_i}$ of the phoneme duration. The resultant value is set as a coefficient $\rho$. The product of the coefficient $\rho$ and the square of the standard deviation $\sigma_{\alpha_i}$ is added to the initial value $d_{\alpha_i}$ of the phoneme duration, and as a result the phoneme duration $d_i$ is obtained.
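
The computation in equations (2), (3a) and (3b) can be sketched as follows. This is a minimal illustration, not the apparatus's actual program: the class name `PhonemeStats`, the function names, and the use of milliseconds are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class PhonemeStats:
    """Per-phoneme statistics (hypothetical container, values in ms)."""
    mean: float       # average duration, mu
    std: float        # standard deviation, sigma
    minimum: float    # minimum observed duration, d_min
    threshold: float  # lower threshold, theta (> 0)


def initial_durations(stats, total_time):
    """Equation (2): start from T / N and clamp it into
    [max(mu - 3*sigma, d_min), mu + 3*sigma] for each phoneme."""
    d0 = total_time / len(stats)
    result = []
    for s in stats:
        low = max(s.mean - 3.0 * s.std, s.minimum)
        high = s.mean + 3.0 * s.std
        result.append(min(max(d0, low), high))
    return result


def set_durations(stats, total_time):
    """Equations (3a) and (3b): distribute the remaining time in
    proportion to sigma squared, then apply the lower threshold."""
    init = initial_durations(stats, total_time)
    rho = (total_time - sum(init)) / sum(s.std ** 2 for s in stats)
    durations = [d + rho * s.std ** 2 for d, s in zip(init, stats)]
    # Equation (3b): a duration below theta is replaced by theta.
    return [max(d, s.threshold) for d, s in zip(durations, stats)]
```

When no duration falls below its threshold, the returned durations sum to the speech production time T. If equation (3b) is applied to some phoneme, the sum can exceed T; how that case is compensated is not stated in the text, so the sketch leaves it as is.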
[0020] The foregoing operation is described with reference to the flowchart in Fig. 3.

[0021] First, in step S1, a phonetic text is inputted by the character string input unit 1. In step S2, control data (speech production speed, pitch of voice) inputted externally and the control data in the phonetic text inputted in step S1 are stored in the control data storage unit 2. In step S3, a phoneme string is generated by the phoneme string generation unit 3 based on the phonetic text inputted by the character string input unit 1.

[0022] Next, in step S4, a phoneme string of the next phoneme duration setting section is stored in the phoneme string storage unit 4. In step S5, the phoneme duration setting unit 5 sets the phoneme duration initial value $d_{\alpha_i}$ in accordance with the type of phoneme $\alpha_i$ (equation (2)). In step S6, speech production time $T$ of the phoneme duration setting section is set based on the control data representing speech production speed, stored in the control data storage unit 2. Then, a phoneme duration is set for each phoneme of the phoneme string in the phoneme duration setting section using the above-described equations (3a) and (3b) such that the total phoneme duration of the phoneme string in the phoneme duration setting section equals the speech production time $T$ of the phoneme duration setting section.
[0023] In step S7, a synthesized speech is generated based on the phoneme string where the phoneme duration is set by the phoneme duration setting unit 5 and the control data representing pitch of voice stored in the control data storage unit 2. In step S8, it is determined whether or not the inputted character string is the last phoneme duration setting section, and if it is not the last phoneme duration setting section, the externally inputted control data is stored in the control data storage unit 2 in step S10, then the process returns to step S4 to continue processing.
[0024] Meanwhile, if it is determined in step S8 that the inputted character string is the last phoneme duration setting section, the process proceeds to step S9 for determining whether or not all input has been completed. If input is not completed, the process returns to step S1 to repeat the above processing.

[0025] The process of determining the duration for each phoneme, performed in steps S5 and S6, is described in further detail.

[0026] Fig. 4 is a table showing a configuration of phoneme data according to the first embodiment. As shown in Fig. 4, phoneme data includes the average value $\mu$ of the phoneme duration, standard deviation $\sigma$, minimum value $d_{\min}$, and threshold value $\theta$ with respect to each phoneme (a, e, i, o, u, ...) of the set of phonemes $\Omega$.

[0027] Fig. 5 is a flowchart showing the process of determining a phoneme duration according to the first embodiment, which shows the detailed process of steps S5 and S6 in Fig. 3.

[0028] First, in step S101, the number of components I in the phoneme string (obtained in step S4 in Fig. 3) and each of the components $\alpha_1$ to $\alpha_I$, obtained with respect to the expiratory paragraph subject to processing, are determined. For instance, if the phoneme string comprises "o, X, s, e, i", $\alpha_1$ to $\alpha_5$ are determined as shown in Fig. 6, and the number of components I is 5. In step S102, the variable i is initialized to 1, and the process proceeds to step S103.
[0029] In step S103, the average value $\mu$, standard deviation $\sigma$, and minimum value $d_{\min}$ for the phoneme $\alpha_i$ are obtained based on the phoneme data shown in Fig. 4. By using the obtained data, the phoneme duration initial value $d_{\alpha_i}$ is determined from the above equation (2). The calculation of the phoneme duration initial value $d_{\alpha_i}$ in step S103 is performed for all the phonemes of the phoneme string subject to processing. More specifically, the variable i is incremented in step S104, and step S103 is repeated as long as the variable i is smaller than I in step S105.

[0030] The foregoing steps S101 to S105 correspond to step S5 in Fig. 3. In the above-described manner, the phoneme duration initial value is obtained for all the phonemes of the phoneme string with respect to the expiratory paragraph subject to processing, and the process proceeds to step S106.

[0031] In step S106, the variable i is initialized to 1. In step S107, the phoneme duration $d_i$ for the phoneme $\alpha_i$ is determined so as to coincide with the speech production time $T$ of the expiratory paragraph, based on the phoneme duration initial value for all the phonemes in the expiratory paragraph obtained in the previous process and the standard deviation of the phoneme $\alpha_i$ (i.e., determined according to the equation (3a)). If the phoneme duration $d_i$ obtained in step S107 is smaller than a threshold value $\theta_{\alpha_i}$ set for the phoneme $\alpha_i$, $d_i$ is set to the threshold value $\theta_{\alpha_i}$ (steps S108 and S109).

[0032] The calculation of the phoneme duration $d_i$ in steps S107 to S109 is performed for all the phonemes of the phoneme string subject to processing. More specifically, the variable i is incremented in step S110, and steps S107 to S109 are repeated as long as the variable i is smaller than I in step S111.

[0033] The foregoing steps S106 to S111 correspond to step S6 in Fig. 3. In the above-described manner, the phoneme duration of all the phonemes of the phoneme string for attaining the production time $T$ is obtained with respect to the expiratory paragraph subject to processing.

[0034] Equation (2) serves to prevent the phoneme duration initial value from being set to an unrealistic value or a value with a low occurrence probability. Assuming that the probability density of the phoneme duration has a normal distribution, the probability of the initial value falling within the range of the average value ± three times the standard deviation is approximately 0.997. Furthermore, in order not to set the phoneme duration to too small a value, the value is set to no less than the minimum value of a sample group of natural speech production.
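
The coverage of the ± three standard deviation range under a normal distribution can be checked numerically. The short sketch below uses the standard relation between the normal distribution and the error function; the function name `prob_within` is introduced here for illustration only.

```python
import math


def prob_within(k):
    """P(|X - mu| <= k * sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2.0))


print(round(prob_within(3.0), 4))  # prints 0.9973
```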

[0035] Equation (3a) is obtained as a result of executing maximum likelihood estimation under the condition of equation (1c), assuming that the normal distribution having the phoneme duration initial value set in equation (2) as an average value is the probability density function for each phoneme duration. The maximum likelihood estimation is described hereinafter.

[0036] Assume that the standard deviation of a phoneme duration of the phoneme $\alpha_i$ is $\sigma_{\alpha_i}$. Also assume that the probability density distribution of the phoneme duration has a normal distribution (equation (4a)). In this condition, the logarithmic likelihood of the phoneme duration is expressed as equation (4b). Herein, achieving the largest logarithmic likelihood is equivalent to obtaining the smallest value $K$ in equation (4c). The phoneme duration $d_i$ satisfying the above equation (1c) is determined so that the logarithmic likelihood of the phoneme duration is the largest.



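$$P_{\alpha_i}(d_i) = \frac{1}{\sqrt{2\pi}\,\sigma_{\alpha_i}} \exp\left(-\frac{(d_i - d_{\alpha_i})^2}{2\sigma_{\alpha_i}^2}\right) \tag{4a}$$

$$\log L(d) = \log\left(\prod_{i=1}^{N} P_{\alpha_i}(d_i)\right) \tag{4b}$$

$$K = \sum_{i=1}^{N} \frac{(d_i - d_{\alpha_i})^2}{\sigma_{\alpha_i}^2} \tag{4c}$$

where

$P_{\alpha_i}(d_i)$: probability density function of the duration of the phoneme $\alpha_i$
$L(d)$: likelihood of the phoneme duration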

[0037] Herein, if variable conversion is performed as shown in equation (5a), equations (4c) and (1c) are expressed by equations (5b) and (5c) respectively. When a sphere (equation (5b)) comes in contact with a plane (equation (5c)), i.e., the case of equation (5d), the value $K$ has the smallest value. As a result, equation (3a) is obtained.

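$$p_i = \frac{d_i - d_{\alpha_i}}{\sigma_{\alpha_i}} \tag{5a}$$

$$K = \sum_{i=1}^{N} p_i^2 \tag{5b}$$

$$\sum_{i=1}^{N} p_i\,\sigma_{\alpha_i} = T - \sum_{i=1}^{N} d_{\alpha_i} \tag{5c}$$

$$p_i = \rho\,\sigma_{\alpha_i} \tag{5d}$$

where

$$\rho = \frac{T - \sum_{i=1}^{N} d_{\alpha_i}}{\sum_{i=1}^{N} \sigma_{\alpha_i}^2}$$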

[0038] Taking equations (2), (3a) and (3b) into consideration, with the use of the statistics (average value, standard deviation, minimum value) obtained from a sample group of natural speech production, the phoneme duration is set to the most probable value (maximum likelihood) which satisfies a desired speech production time (equation (1c)). Accordingly, it is possible to obtain a natural phoneme duration, i.e., an error occurring in the phoneme duration is small when speech is produced to satisfy the desired speech production time (equation (1c)).
[Second Embodiment]

[0039] In the first embodiment, the phoneme duration $d_i$ of each phoneme $\alpha_i$ is determined according to a rule without considering the speech production speed or the category of the phoneme. In the second embodiment, the rule for determining a phoneme duration $d_i$ is varied in accordance with the speech production speed or the category of the phoneme to realize more natural speech synthesis. Note that the hardware construction and the functional configuration of the second embodiment are the same as those of the first embodiment (Figs. 1 and 2).
[0040] A phoneme $\alpha_i$ is categorized according to the speech production speed, and the average value, standard deviation, and minimum value are obtained. For instance, categories of speech production speed are expressed as follows using an average mora duration in an expiratory paragraph:

1: less than 120 milliseconds
2: equal to or greater than 120 milliseconds and less than 140 milliseconds
3: equal to or greater than 140 milliseconds and less than 160 milliseconds
4: equal to or greater than 160 milliseconds and less than 180 milliseconds
5: equal to or greater than 180 milliseconds
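
The mapping from an average mora duration to a category index can be sketched as a lookup over the four boundaries listed above. The names `BOUNDARIES_MS` and `speed_category` are illustrative assumptions, not part of the described apparatus.

```python
from bisect import bisect_right

# Lower bounds (in milliseconds) of categories 2 to 5, from the list above.
BOUNDARIES_MS = (120.0, 140.0, 160.0, 180.0)


def speed_category(average_mora_ms):
    """Return the category index (1 to 5) for an average mora duration.

    bisect_right counts the boundaries that are <= the value, so a value
    exactly on a boundary falls into the higher category, matching
    "equal to or greater than".
    """
    return bisect_right(BOUNDARIES_MS, average_mora_ms) + 1
```

For example, `speed_category(119.9)` returns 1 and `speed_category(120.0)` returns 2.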

[0041] Note that the numeral value assigned to each category is a category index corresponding to each speech production speed. Herein, if the category index corresponding to a speech production speed is defined as $n$, the average value, standard deviation, and the minimum value of the phoneme duration are respectively expressed as $\mu_{\alpha_i}(n)$, $\sigma_{\alpha_i}(n)$, $d_{\alpha_i \min}(n)$.

[0042] The phoneme duration initial value of the phoneme $\alpha_i$ is defined as $d_{\alpha_i 0}$. In a set of phonemes $\Omega_a$, the phoneme duration initial value $d_{\alpha_i 0}$ is determined by an average value. In a set of phonemes $\Omega_r$, the phoneme duration initial value $d_{\alpha_i 0}$ is determined by Categorical Multiple Regression, one of the multiple regression analysis techniques (a technique for explaining or predicting a quantitative external reference based on qualitative data). The set of phonemes $\Omega$ contains no element that belongs to neither $\Omega_a$ nor $\Omega_r$, and no element that belongs to both $\Omega_a$ and $\Omega_r$. In other words, the set of phonemes satisfies the following equations (6a) and (6b).

$$\Omega_a \cup \Omega_r = \Omega \tag{6a}$$

$$\Omega_a \cap \Omega_r = \emptyset \tag{6b}$$

[0043] When $\alpha_i \in \Omega_a$, i.e., $\alpha_i$ belongs to $\Omega_a$, the phoneme duration initial value is determined by an average value. More specifically, the category index $n$ corresponding to speech production speed is obtained and the phoneme duration initial value is determined by the following equation (7):

$$d_{\alpha_i 0} = \mu_{\alpha_i}(n) \tag{7}$$

[0044] Meanwhile, when $\alpha_i \in \Omega_r$, i.e., $\alpha_i$ belongs to $\Omega_r$, the phoneme duration initial value is determined by Categorical Multiple Regression. Herein, assuming that the index of factors is $j$ ($1 \le j \le J$) and the category index corresponding to each factor is $k$ ($1 \le k \le K(j)$), the coefficient for Categorical Multiple Regression corresponding to $(j, k)$ is $a_{jk}$.

[0045] For instance, the following factors may be used.

1: the phoneme, two phonemes preceding the subject phoneme
2: the phoneme, one phoneme preceding the subject phoneme
3: subject phoneme
4: the phoneme, one phoneme succeeding the subject phoneme
5: the phoneme, two phonemes succeeding the subject phoneme
6: an average mora duration in an expiratory paragraph
7: mora position in an expiratory paragraph
8: part of speech of the word including a subject phoneme

[0046] The numeral assigned to each of the above factors indicates an index of a factor $j$.

[0047] Examples of categories corresponding to each factor are provided hereinafter. Categories of phonemes are: 1: a, 2: e, 3: i, 4: o, 5: u, 6: X, 7: b, 8: d, 9: g, 10: m, 11: n, 12: r, 13: w, 14: y, 15: z, 16: +, 17: c, 18: f, 19: h, 20: k, 21: p, 22: s, 23: sh, 24: t, 25: ts, 26: Q, 27: pause. When the factor is "subject phoneme", "pause" is removed. Although the expiratory paragraph is defined as a phoneme duration setting section in the present embodiment, since the expiratory paragraph does not include a pause, "pause" is removed from the subject phoneme. Note that the term "expiratory paragraph" defines a section between pauses (the start and end of the sentence), which does not include a pause in the middle.

[0048] Categories of an average mora duration in an expiratory paragraph include the following:

1: less than 120 milliseconds
2: equal to or greater than 120 milliseconds and less than 140 milliseconds
3: equal to or greater than 140 milliseconds and less than 160 milliseconds
4: equal to or greater than 160 milliseconds and less than 180 milliseconds
5: equal to or greater than 180 milliseconds

[0049] Categories of a mora position include the following:

1: first mora
2: second mora
3: third mora from the beginning and the third mora from the end
4: the second mora from the end
5: end mora

[0050] Categories of a part of speech (according to Japanese grammar) include the following:

1: noun, 2: adverbial noun, 3: pronoun, 4: proper noun, 5: number, 6: verb, 7: adjective, 8: adjectival verb, 9: adverb, 10: attributive, 11: conjunction, 12: interjection, 13: auxiliary verb, 14: case particle, 15: subordinate particle, 16: collateral particle, 17: auxiliary particle, 18: conjunctive particle, 19: closing particle, 20: prefix, 21: suffix, 22: adjectival verbal suffix, 23: sa-irregular conjugation suffix, 24: adjectival suffix, 25: verbal suffix, 26: counter

[0051] Note that factors (also called items) indicate the type of qualitative data used in prediction of Categorical Multiple Regression. The categories indicate possible selections for each factor. The following are provided based on the above examples.

index of factor j = 1: the phoneme, two phonemes preceding the subject phoneme
category corresponding to index k=1: a
category corresponding to index k=2: e
category corresponding to index k=3: i
category corresponding to index k=4: o
...
category corresponding to index k=26: Q
category corresponding to index k=27: pause

index of factor j = 2: the phoneme, one phoneme preceding the subject phoneme
category corresponding to index k=1: a
category corresponding to index k=2: e
category corresponding to index k=3: i
category corresponding to index k=4: o
...
category corresponding to index k=26: Q
category corresponding to index k=27: pause

index of factor j = 3: the subject phoneme
category corresponding to index k=1: a
category corresponding to index k=2: e
category corresponding to index k=3: i
category corresponding to index k=4: o
...
category corresponding to index k=26: Q

index of factor j = 4: the phoneme, one phoneme succeeding the subject phoneme
category corresponding to index k=1: a
category corresponding to index k=2: e
category corresponding to index k=3: i
category corresponding to index k=4: o
...
category corresponding to index k=26: Q
category corresponding to index k=27: pause

index of factor j = 5: the phoneme, two phonemes succeeding the subject phoneme
category corresponding to index k=1: a
category corresponding to index k=2: e
category corresponding to index k=3: i
category corresponding to index k=4: o
...
category corresponding to index k=26: Q
category corresponding to index k=27: pause

index of factor j = 6: an average mora duration in an expiratory paragraph
category corresponding to index k=1: less than 120 milliseconds
category corresponding to index k=2: equal to or greater than 120 milliseconds and less than 140 milliseconds
category corresponding to index k=3: equal to or greater than 140 milliseconds and less than 160 milliseconds
category corresponding to index k=4: equal to or greater than 160 milliseconds and less than 180 milliseconds
category corresponding to index k=5: equal to or greater than 180 milliseconds

index of factor j = 7: mora position in an expiratory paragraph
category corresponding to index k=1: first mora
category corresponding to index k=2: second mora
...
category corresponding to index k=5: end mora

index of factor j = 8: part of speech of the word including a subject phoneme
category corresponding to index k=1: noun
category corresponding to index k=2: adverbial noun
...
category corresponding to index k=26: counter

[0052] It is so set that the average value of the coefficient $a_{jk}$ for each factor is 0, i.e., equation (8) is satisfied. Note that the coefficient $a_{jk}$ is stored in the external memory 104, as will be described later with reference to Fig. 7.

$$\sum_{k=1}^{K(j)} a_{jk} = 0 \quad (1 \le j \le J) \tag{8}$$



[0053] Furthermore, a dummy variable of the phoneme $\alpha_i$ is set as follows.

$$
\delta_i(j, k) =
\begin{cases}
1 & \text{(phoneme } \alpha_i \text{ has value for category } k \text{ of factor } j\text{)} \\
0 & \text{(case other than above)}
\end{cases}
\tag{9}
$$

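[0054] A constant to be added to the sum of products of the coefficient and the dummy variable is defined as $c_0$. An estimated value $\hat{d}_{\alpha_i}$ of a phoneme duration of the phoneme $\alpha_i$ according to Categorical Multiple Regression is expressed as equation (10).

$$\hat{d}_{\alpha_i} = \sum_{j=1}^{J} \sum_{k=1}^{K(j)} a_{jk}\,\delta_i(j, k) + c_0 \tag{10}$$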
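
The estimate of equation (10) can be sketched as a table lookup, because the dummy variable of equation (9) is 1 only for the category actually observed for each factor, so the double sum reduces to one coefficient per factor. The coefficient values, the two-factor table, and the names `COEFFS`, `C0`, `estimate_duration` and `is_centered` below are hypothetical and serve only as an illustration.

```python
def estimate_duration(coefficients, constant, observed):
    """Equation (10): constant c0 plus the coefficient a[j][k] of the
    category k observed for each factor j.

    `observed` maps factor index j to the observed category index k.
    """
    return constant + sum(coefficients[j][k] for j, k in observed.items())


def is_centered(coefficients, tol=1e-9):
    """Equation (8): the coefficients of every factor sum to zero."""
    return all(abs(sum(row.values())) <= tol for row in coefficients.values())


# Hypothetical coefficients in ms; a real table would cover all J factors
# and all categories of each factor.
COEFFS = {
    3: {1: 10.0, 2: -4.0, 3: -6.0},                    # subject phoneme (a, e, i only)
    6: {1: -15.0, 2: -5.0, 3: 0.0, 4: 5.0, 5: 15.0},   # average mora duration
}
C0 = 80.0

# Subject phoneme "e" (k=2), average mora duration category 4.
print(estimate_duration(COEFFS, C0, {3: 2, 6: 4}))  # prints 81.0
```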






[0055] Using the estimated value, the phoneme duration initial value of the phoneme $\alpha_i$ is determined by equation (11).

$$d_{\alpha_i 0} = \hat{d}_{\alpha_i} \tag{11}$$



[0056] Furthermore, the category index $n$ corresponding to speech production speed is obtained, then the average value, standard deviation, and minimum value of the phoneme duration in the category are obtained. With these values, the phoneme duration initial value $d_{\alpha_i 0}$ is updated by the following equation (12). The obtained initial value $d_{\alpha_i}$ is set as a new phoneme duration initial value.



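$$
d_{\alpha_i} =
\begin{cases}
\max(\mu_{\alpha_i}(n) - r_\sigma \sigma_{\alpha_i}(n),\ d_{\alpha_i \min}(n)) & \text{if } d_{\alpha_i 0} < \max(\mu_{\alpha_i}(n) - r_\sigma \sigma_{\alpha_i}(n),\ d_{\alpha_i \min}(n)) \\
d_{\alpha_i 0} & \text{if } \max(\mu_{\alpha_i}(n) - r_\sigma \sigma_{\alpha_i}(n),\ d_{\alpha_i \min}(n)) \le d_{\alpha_i 0} \le \mu_{\alpha_i}(n) + r_\sigma \sigma_{\alpha_i}(n) \\
\mu_{\alpha_i}(n) + r_\sigma \sigma_{\alpha_i}(n) & \text{if } \mu_{\alpha_i}(n) + r_\sigma \sigma_{\alpha_i}(n) < d_{\alpha_i 0}
\end{cases}
\tag{12}
$$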



[0057] A coefficient $r_\sigma$, which is multiplied by the standard deviation in equation (12), is set as, e.g., $r_\sigma = 3$. With the phoneme duration initial value obtained in the foregoing manner, the phoneme duration is determined by a method similar to that described in the first embodiment. More specifically, the phoneme duration $d_i$ is determined using the following equation (13a). The phoneme duration $d_i$ is determined by equation (13b) if it satisfies $d_i < \theta_{\alpha_i}$, where $\theta_{\alpha_i}$ (> 0) is a threshold value.



45 



<*i=«^ai + p(<ycxi(")) 



(13a) 



where 



50 



P = 



(T-SdJ 

N 
b1 



55 



d, = e, 



(13b} 
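
A short check, under the assumption that no phoneme falls below its threshold (so that equation (13b) is never applied), shows why this choice of the coefficient ρ makes the durations add up to the speech production time T:

```latex
\begin{align*}
\sum_{i=1}^{N} d_{\alpha_i}
  &= \sum_{i=1}^{N} d_{\alpha_i 0} + \rho \sum_{i=1}^{N} \bigl(\sigma_{\alpha_i}(n)\bigr)^2 \\
  &= \sum_{i=1}^{N} d_{\alpha_i 0} + \Bigl(T - \sum_{i=1}^{N} d_{\alpha_i 0}\Bigr) = T .
\end{align*}
```

Each phoneme therefore absorbs a share of the surplus or deficit in proportion to the square of its standard deviation, so phonemes whose durations vary more in the statistics are stretched or compressed more.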
[0058] The above-described operation will be described with reference to the flowchart in Fig. 3. In step S1, a phonetic
text is inputted by the character string input unit 1. In step S2, control data (speech production speed, pitch of voice)
inputted externally and the control data in the phonetic text inputted in step S1 are stored in the control data storage
unit 2. In step S3, a phoneme string is generated by the phoneme string generation unit 3 based on the phonetic text
inputted by the character string input unit 1. In step S4, a phoneme string of the next duration setting section is stored
in the phoneme string storage unit 4.

[0059] In step S5, the phoneme duration setting unit 5 sets the phoneme duration initial value in accordance with the
type of phoneme (category) by using the above-described method, based on the control data representing speech
production speed stored in the control data storage unit 2, the average value, standard deviation, and minimum value
of the phoneme duration, and the phoneme duration estimation value estimated by Categorical Multiple Regression.

[0060] In step S6, the phoneme duration setting unit 5 sets the speech production time of the phoneme duration setting
section based on the control data representing speech production speed, stored in the control data storage unit 2.
Then, the phoneme duration is set for each phoneme string of the phoneme duration setting section using the
above-described method such that the total phoneme duration of the phoneme string in the phoneme duration setting
section equals the speech production time of the phoneme duration setting section.

[0061] In step S7, a synthesized speech is generated based on the phoneme string where the phoneme duration is
set by the phoneme duration setting unit 5 and the control data representing pitch of voice stored in the control data
storage unit 2. In step S8, it is determined whether or not the inputted character string is the last phoneme duration
setting section, and if it is not the last phoneme duration setting section, the process proceeds to step S10. In step S10,
the control data externally inputted is stored in the control data storage unit 2, then the process returns to step S4 to
continue processing. Meanwhile, if it is determined in step S8 that the inputted character string is the last phoneme
duration setting section, the process proceeds to step S9 for determining whether or not all input has been completed.
If input is not completed, the process returns to step S1 to repeat the above processing.

[0062] The process of determining the duration for each phoneme, performed in steps S5 and S6 according to the
second embodiment, is described in further detail.

[0063] Fig. 7 is a table showing a data configuration of a coefficient table storing the coefficient ajk for Categorical
Multiple Regression according to the second embodiment. As described above, the factor j of the present embodiment
includes factors 1 to 8. For each factor, a coefficient ajk corresponding to the category is registered.

[0064] For instance, there are twenty-seven categories (phoneme categories) for the factor j=1, and twenty-seven
coefficients a1,1 to a1,27 are stored.

[0065] Fig. 8 is a table showing a data configuration of phoneme data according to the second embodiment. As shown
in Fig. 8, phoneme data includes a flag indicative of whether a phoneme belongs to Ωa or Ωr, a dummy variable δ(j,k)
indicative of whether or not a phoneme has a value for category k of the factor j, and an average value μ, a standard
deviation σ, a minimum value dmin, and a threshold value θ of the phoneme duration for each category of speech
production time, with respect to each phoneme (a, e, i, o, u, ...) of the set of phonemes Ω.
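
The two tables can be pictured as simple in-memory structures. The sketch below is an assumption about one convenient layout, not the storage format of the source: every class name, field name, and number is hypothetical, and the dummy variable δ(j,k) is stored sparsely as a mapping from factor j to the single active category k.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class CategoryStats:
    """Duration statistics of one phoneme within one speech-speed category n."""
    mean: float       # average value (mu)
    std: float        # standard deviation (sigma)
    minimum: float    # minimum value (dmin)
    threshold: float  # threshold value (theta), greater than 0


@dataclass
class PhonemeData:
    """One entry of the phoneme data of Fig. 8 (field names are hypothetical)."""
    use_regression: bool  # flag: True for the set Omega_r, False for Omega_a
    dummy: Dict[int, int] = field(default_factory=dict)            # factor j -> active category k
    stats: Dict[int, CategoryStats] = field(default_factory=dict)  # category n -> statistics


# Coefficient table of Fig. 7: (factor j, category k) -> a_{j,k}.
CoefficientTable = Dict[Tuple[int, int], float]

# Illustrative values only; none of these numbers come from the source.
phoneme_table: Dict[str, PhonemeData] = {
    "a": PhonemeData(False, {}, {1: CategoryStats(90.0, 15.0, 40.0, 30.0)}),
    "s": PhonemeData(True, {1: 2}, {1: CategoryStats(70.0, 10.0, 35.0, 25.0)}),
}
coefficient_table: CoefficientTable = {(1, 1): 6.0, (1, 2): -6.0}

print(phoneme_table["s"].stats[1].mean)  # 70.0
```

Keying the statistics by category n keeps the lookup in the later steps to a single dictionary access once n has been determined.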

[0066] With the data shown in Figs. 7 and 8, steps S5 and S6 in Fig. 3 are executed. Hereinafter, this process will be
described in detail with reference to the flowchart in Figs. 9A and 9B.

[0067] In step S201 in Fig. 9A, the number of components I in the phoneme string and each of the components α1 to
αI, obtained with respect to the expiratory paragraph subject to processing (obtained in step S4 in Fig. 3), are
determined. For instance, if the phoneme string comprises "o, X, s, e, i", α1 to α5 are determined as shown in Fig. 6,
and the number of components I is 5. In step S202, a category n corresponding to speech production speed is determined.
In the present embodiment, the speech production time T of the expiratory paragraph is determined based on a speech
production speed represented by control data. The time T is divided by the number of components of the phoneme
string in the expiratory paragraph to obtain an average mora duration, and the category n is determined. In step S203,
the variable i is initialized to 1, and the phoneme duration initial value is obtained by the following steps S204 to S209.
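
Step S202 can be sketched as a threshold lookup on the average duration. The source does not state the category boundaries, so the values in `CATEGORY_BOUNDS_MS`, the 1-based numbering, and the function name `speed_category` are all assumptions made for illustration.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical upper bounds (milliseconds per component) separating the
# speed categories. The source does not give these boundaries.
CATEGORY_BOUNDS_MS = (100.0, 140.0, 180.0)


def speed_category(total_time_ms: float, n_components: int,
                   bounds: Sequence[float] = CATEGORY_BOUNDS_MS) -> int:
    """Return a 1-based category index n from the average duration T / I."""
    if n_components <= 0:
        raise ValueError("the phoneme string must contain at least one component")
    average = total_time_ms / n_components
    # bisect_right counts how many boundaries are <= average.
    return bisect_right(bounds, average) + 1


print(speed_category(600.0, 5))  # average 120 ms -> category 2
```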
[0068] In step S204, the phoneme data shown in Fig. 8 is referred to in order to determine whether or not the phoneme αi
belongs to Ωr. If the phoneme αi belongs to Ωr, the process proceeds to step S205 where the coefficient ajk is obtained
from the coefficient table shown in Fig. 7 and the dummy variable (δi(j,k)) of the phoneme αi is obtained from the
phoneme data shown in Fig. 8. Then dαi0 is calculated using the aforementioned equations (10) and (11). Meanwhile, if the
phoneme αi belongs to Ωa in step S204, the process proceeds to step S206 where an average value μ of the phoneme
αi in the category n is obtained from the phoneme table, and dαi0 is obtained by equation (7).

[0069] Then, the process proceeds to step S207 where the phoneme duration initial value dαi0 of the phoneme αi is
determined by equation (12), utilizing μ, σ, and dmin of the phoneme αi in the category n, which are obtained from the
phoneme table, and dαi0 obtained in step S205 or S206.

[0070] The calculation of the phoneme duration initial value dαi0 in steps S204 to S207 is performed for all the
phoneme strings subject to processing. More specifically, the variable i is incremented in step S208, and steps S204 to
S207 are repeated as long as the variable i is smaller than I in step S209.

[0071] The foregoing steps S201 to S209 correspond to step S5 in Fig. 3. In the above-described manner, the phoneme
duration initial value is obtained for all the phoneme strings in the expiratory paragraph subject to processing, and
the process proceeds to step S211.
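
The loop of steps S203 to S209 can be sketched as follows. This is a minimal illustration, not the implementation of the source: the types `Stats` and `Phoneme` are hypothetical, the regression estimate is passed in as a callable stand-in, the Ωa branch assumes that equation (7) (not shown in this part of the text) takes the category average as the initial value, and an initial value already inside the range is assumed to stay unchanged.

```python
from typing import Callable, List, NamedTuple


class Stats(NamedTuple):
    mean: float     # mu of the phoneme in category n
    std: float      # sigma of the phoneme in category n
    minimum: float  # dmin of the phoneme in category n


class Phoneme(NamedTuple):
    use_regression: bool           # True: set Omega_r, False: set Omega_a
    stats: Stats
    estimate: Callable[[], float]  # stand-in for the regression estimate


R_SIGMA = 3.0  # coefficient multiplied by the standard deviation


def clamp_initial(d0: float, s: Stats, r_sigma: float = R_SIGMA) -> float:
    """Equation (12): pull d0 back into the statistically plausible range."""
    lower = max(s.mean - r_sigma * s.std, s.minimum)
    upper = s.mean + r_sigma * s.std
    if d0 < lower:
        return lower
    if upper < d0:
        return upper
    return d0  # assumed: values inside the range are kept as they are


def initial_durations(phonemes: List[Phoneme]) -> List[float]:
    result = []
    for p in phonemes:                             # S203, S208, S209: loop over i
        if p.use_regression:                       # S204: does the phoneme belong to Omega_r?
            d0 = p.estimate()                      # S205: regression estimate
        else:
            d0 = p.stats.mean                      # S206: assumed form of equation (7)
        result.append(clamp_initial(d0, p.stats))  # S207: equation (12)
    return result


example = [
    Phoneme(False, Stats(90.0, 15.0, 40.0), lambda: 0.0),
    Phoneme(True, Stats(70.0, 10.0, 35.0), lambda: 120.0),
]
print(initial_durations(example))  # [90.0, 100.0]
```

The second phoneme shows the clamp at work: its estimate of 120 exceeds the upper limit 70 + 3 × 10 = 100 and is cut back to that limit.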

[0072] In step S211, the variable i is initialized to 1. In step S212, the phoneme duration dαi of the phoneme αi is
determined so as to coincide with the speech production time T of the expiratory paragraph, based on the phoneme
duration initial value for all the phonemes in the expiratory paragraph obtained in the previous process and the
standard deviation of the phoneme αi in the category n (i.e., determined according to equation (13a)). If the phoneme
duration dαi obtained in step S212 is smaller than the threshold value θαi set for the phoneme αi, dαi is set to the
threshold value θαi (steps S213, S214, and equation (13b)).

[0073] The calculation of the phoneme duration dαi in steps S212 to S214 is performed for all the phoneme strings
subject to processing. More specifically, the variable i is incremented in step S215, and steps S212 to S214 are
repeated as long as the variable i is smaller than I in step S216.

[0074] The foregoing steps S211 to S216 correspond to step S6 in Fig. 3. In the above-described manner, the phoneme
duration of all the phoneme strings for attaining the production time T is obtained with respect to the expiratory
paragraph subject to processing.
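
Steps S211 to S216 can be sketched as a single pass over the phonemes. The function name `allocate_durations` and the sample numbers are hypothetical. Note one consequence visible in the sketch: when the threshold of equation (13b) raises a duration, the total can exceed T, and this part of the text does not describe any further correction.

```python
from typing import List, Sequence


def allocate_durations(total_time: float,
                       initial: Sequence[float],
                       std: Sequence[float],
                       threshold: Sequence[float]) -> List[float]:
    """Distribute total_time over the phonemes per equations (13a) and (13b)."""
    if not (len(initial) == len(std) == len(threshold)):
        raise ValueError("all per-phoneme sequences must have the same length")
    sum_var = sum(s * s for s in std)
    if sum_var == 0:
        raise ValueError("at least one standard deviation must be non-zero")
    # Coefficient rho: remaining time divided by the sum of squared deviations.
    rho = (total_time - sum(initial)) / sum_var
    durations = []
    for d0, s, theta in zip(initial, std, threshold):  # S211, S215, S216
        d = d0 + rho * s * s                           # S212: equation (13a)
        if d < theta:                                  # S213: below threshold?
            d = theta                                  # S214: equation (13b)
        durations.append(d)
    return durations


durations = allocate_durations(
    total_time=300.0,
    initial=[90.0, 100.0, 80.0],
    std=[10.0, 20.0, 10.0],
    threshold=[30.0, 30.0, 30.0],
)
print(round(sum(durations), 6))  # 300.0
```

Here the initial values total 270, so 30 must be added; the middle phoneme has twice the standard deviation of the others and receives four times their share.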

[0075] Note that the construction of each of the above embodiments merely shows an embodiment of the present
invention. Thus, various modifications are possible. Examples of such modifications include the following.

(1) In each of the above embodiments, the set of phonemes Ω is merely an example, thus a set of other elements
may be used. Elements of a set of phonemes may be determined based on the type of language and phonemes.
Also, the present invention is applicable to a language other than Japanese.

(2) In each of the above embodiments, the expiratory paragraph is an example of the phoneme duration setting
section. Thus, a word, a morpheme, a clause, a sentence or the like may be set as a phoneme duration setting
section. Note that if a sentence is set as the phoneme duration setting section, it is necessary to consider pauses
between phonemes.

(3) In each of the above embodiments, a phoneme duration of natural speech ... phoneme duration. Alternatively,
a value determined by other phoneme duration control rules or a value estimated by Categorical Multiple
Regression may be used.

(4) In the above second embodiment, the category corresponding to speech production speed, which is used to
obtain an average value of the phoneme duration, is merely an example, and other categories may be used.

(5) In the above second embodiment, the factors for Categorical Multiple Regression are merely
an example, thus other factors and categories may be used.

(6) In each of the above embodiments, the coefficient rσ = 3, which is multiplied by the standard deviation used for
setting the phoneme duration initial value, is merely an example, thus another value may be set.

[0076] Further, the object of the present invention can also be achieved by providing a storage medium, storing
software program codes achieving the above-described functions of the present embodiments, to a system or an
apparatus, reading the program codes by a computer (e.g., CPU or MPU) of the system or the apparatus from the
storage medium, then executing the program.

[0077] In this case, the program codes read from the storage medium realize the functions according to the
above-described embodiments, and the storage medium storing the program codes constitutes the present invention.

[0078] A storage medium, such as a floppy disk, a hard disk, an optical disk, a magneto-optical disk, CD-ROM,
CD-R, a magnetic tape, a non-volatile type memory card, and ROM can be used for providing the program codes.
[0079] Furthermore, besides the case where the aforesaid functions according to the above embodiments are realized
by executing the program codes which are read by a computer, the present invention includes a case where an OS
(operating system) or the like working on the computer performs a part or the entire processes in accordance with
designations of the program codes and realizes functions according to the above embodiments.

[0080] Furthermore, the present invention also includes a case where, after the program codes read from the storage
medium are written in a function expansion card which is inserted into the computer or in a memory provided in a
function expansion unit which is connected to the computer, a CPU or the like contained in the function expansion
card or unit performs a part or the entire process in accordance with designations of the program codes and realizes
functions of the above embodiments.

[0081] Further, the program codes can be obtained in electronic form, for example by downloading the code over a
network such as the Internet. Thus, in accordance with another aspect of the present invention, there is provided an
electrical signal carrying processor implementable instructions for controlling a processor to carry out the method as
hereinbefore described.

[0082] As has been set forth above, according to the present invention, a phoneme duration of a phoneme string can
be set so as to achieve a specified speech production time. Thus, it is possible to realize natural phoneme duration
regardless of the length of the speech production time.

[0083] As many apparently widely different embodiments of the present invention can be made without departing from
the spirit and scope thereof, it is to be understood that the invention is not limited to the specific embodiments thereof
except as defined in the claims.

Claims 

1. A speech synthesizing apparatus for performing speech synthesis according to an inputted phoneme string,
comprising:

storage means for storing statistical data related to a phoneme duration of each phoneme;
determining means for determining speech production time of a phoneme string in a predetermined section;
setting means for setting a phoneme duration corresponding to the speech production time of each phoneme
constructing the phoneme string, based on the statistical data of each phoneme obtained from said storage
means; and

generating means for generating a speech waveform by connecting phonemes using the phoneme duration.

2. The speech synthesizing apparatus according to claim 1, wherein the statistical data stored in said storage means
includes an average value, a standard deviation, and a minimum value of the phoneme duration of each phoneme.

3. The speech synthesizing apparatus according to claim 1, wherein said setting means sets the phoneme duration
of each phoneme such that a total phoneme duration of phonemes constructing the phoneme string in the
predetermined section is close to the speech production time determined by said determining means.

4. The speech synthesizing apparatus according to claim 1, wherein said setting means includes:

first setting means for setting an initial duration within a predetermined time range determined based on the
statistical data stored in said storage means, with respect to each phoneme constructing the phoneme string
in the predetermined section; and

second setting means for setting a phoneme duration of each phoneme based on the initial duration and the
statistical data so that a total phoneme duration of phonemes constructing the phoneme string is close to the
speech production time.

5. The speech synthesizing apparatus according to claim 4, wherein the statistical data stored in said storage means
includes an average value, a standard deviation, and a minimum value of the phoneme duration of each phoneme,
and

said first setting means sets the initial duration to fall within the predetermined time range determined based
on the average value, the standard deviation, and the minimum value of the phoneme duration, with respect to
each phoneme.

6. The speech synthesizing apparatus according to claim 4, wherein said first setting means allocates an average
time, corresponding to speech production speed obtained by dividing the speech production time by a number of
phonemes constructing the phoneme string, to each phoneme, and

if the obtained average time falls within the predetermined time range, the average time is set as the initial
duration of each phoneme, while if the obtained average time exceeds the predetermined time range, the initial
duration of each phoneme is set to fall within the predetermined time range.

7. The speech synthesizing apparatus according to claim 5, wherein said second setting means sets the phoneme
duration of each phoneme based on the initial duration, the speech production time, and the standard deviation
stored in said storage means.

8. The speech synthesizing apparatus according to claim 7, wherein said second setting means employs, as a
coefficient, a value obtained by subtracting a total initial duration corresponding to each phoneme from the speech
production time and dividing the subtracted value by a sum of squares of the standard deviation corresponding to each
phoneme, and sets as the phoneme duration a value obtained by adding a product of the coefficient and a square
of the standard deviation of the phoneme to the initial duration of the phoneme.

9. The speech synthesizing apparatus according to claim 4, further comprising a first initial value setting means for
obtaining an estimated duration with respect to each phoneme by a multiple regression analysis, wherein

if the estimated duration falls within the predetermined time range, the estimated duration is set as the initial
duration, while if the estimated duration exceeds the predetermined time range, the initial duration is set to fall
within the predetermined time range, and

said first setting means sets the phoneme duration initial value by executing said first initial value setting
means.

10. The speech synthesizing apparatus according to claim 9, wherein the statistical data stored in said storage means
includes an average value, a standard deviation, and a minimum value of the phoneme duration of each phoneme,

said speech synthesizing apparatus further comprising a second initial value setting means for allocating an
average time, obtained by dividing the speech production time by a number of phonemes constructing the
phoneme string, to each phoneme, and setting the average time as the initial duration of each phoneme if the
obtained average time falls within the predetermined time range, while setting the initial duration of each
phoneme to fall within the predetermined time range if the obtained average time exceeds the predetermined time
range, and

said first setting means selectively utilizes the first initial value setting means or the second initial value setting
means in accordance with a type of phoneme.

11. The speech synthesizing apparatus according to claim 9, wherein said storage means stores statistical data related
to a phoneme duration of each phoneme for each category based on a speech production speed, and

said setting means determines a category of speech production speed based on the speech production time
and the phoneme string in the predetermined section, and sets the phoneme duration of each phoneme based
on statistical data belonging to the determined category.

12. A speech synthesizing method of performing speech synthesis according to an inputted phoneme string,
comprising the steps of:

determining speech production time of a phoneme string in a predetermined section;
setting a phoneme duration corresponding to the speech production time of each phoneme constructing the
phoneme string, based on statistical data of each phoneme obtained from a storage unit storing statistical data
related to a phoneme duration of each phoneme; and

generating a speech waveform by connecting phonemes using the phoneme duration.

13. The speech synthesizing method according to claim 12, wherein the statistical data stored in said storage unit
includes an average value, a standard deviation, and a minimum value of the phoneme duration of each phoneme.

14. The speech synthesizing method according to claim 12, wherein in said setting step, the phoneme duration of each
phoneme is set such that a total phoneme duration of phonemes constructing the phoneme string in the
predetermined section is close to the speech production time determined in said determining step.

15. The speech synthesizing method according to claim 12, wherein said setting step includes:

a first setting step of setting an initial duration within a predetermined time range determined based on the
statistical data stored in said storage unit, with respect to each phoneme constructing the phoneme string in the
predetermined section; and

a second setting step of setting a phoneme duration of each phoneme based on the initial duration and the
statistical data so that a total phoneme duration of phonemes constructing the phoneme string is close to the
speech production time.

16. The speech synthesizing method according to claim 15, wherein the statistical data stored in said storage unit
includes an average value, a standard deviation, and a minimum value of the phoneme duration of each phoneme,
and

in said first setting step, the initial duration is set to fall within the predetermined time range determined based
on the average value, the standard deviation, and the minimum value of the phoneme duration, with respect to
each phoneme.

17. The speech synthesizing method according to claim 15, wherein in said first setting step, an average time,
corresponding to speech production speed obtained by dividing the speech production time by a number of phonemes
constructing the phoneme string, is allocated to each phoneme, and

if the obtained average time falls within the predetermined time range, the average time is set as the initial
duration of each phoneme, while if the obtained average time exceeds the predetermined time range, the initial
duration of each phoneme is set to fall within the predetermined time range.

18. The speech synthesizing method according to claim 16, wherein in said second setting step, the phoneme duration
of each phoneme is set based on the initial duration, the speech production time, and the standard deviation stored
in said storage unit.

19. The speech synthesizing method according to claim 18, wherein said second setting step employs, as a coefficient,
a value obtained by subtracting a total initial duration corresponding to each phoneme from the speech production
time and dividing the subtracted value by a sum of squares of the standard deviation corresponding to each
phoneme, and a value obtained by adding a product of the coefficient and a square of the standard deviation of the
phoneme to the initial duration of the phoneme is set as the phoneme duration.

20. The speech synthesizing method according to claim 15, further comprising a first initial value setting step of
obtaining an estimated duration with respect to each phoneme by a multiple regression analysis, wherein

if the estimated duration falls within the predetermined time range, the estimated duration is set as the initial
duration, while if the estimated duration exceeds the predetermined time range, the initial duration is set to fall
within the predetermined time range, and

in said first setting step, the phoneme duration initial value is set by executing said first initial value setting step.

21. The speech synthesizing method according to claim 20, wherein the statistical data stored in said storage unit
includes an average value, a standard deviation, and a minimum value of the phoneme duration of each phoneme,

said speech synthesizing method further comprising a second initial value setting step of allocating an average
time, obtained by dividing the speech production time by a number of phonemes constructing the phoneme
string, to each phoneme, and setting the average time as the initial duration of each phoneme if the obtained
average time falls within the predetermined time range, while setting the initial duration of each phoneme to fall
within the predetermined time range if the obtained average time exceeds the predetermined time range, and
in said first setting step, the first initial value setting step or the second initial value setting step is selectively
utilized in accordance with a type of phoneme.

22. The speech synthesizing method according to claim 20, wherein said storage unit stores statistical data related to
a phoneme duration of each phoneme for each category based on a speech production speed, and

in said setting step, a category of speech production speed is determined based on the speech production time
and the phoneme string in the predetermined section, and the phoneme duration of each phoneme is set
based on statistical data belonging to the determined category.

23. A storage medium storing a control program for having a computer realize a speech synthesizing process of
performing speech synthesis according to an inputted phoneme string, said control program comprising:

codes for a step of determining speech production time of a phoneme string in a predetermined section;
codes for a step of setting a phoneme duration corresponding to the speech production time of each phoneme
constructing the phoneme string, based on statistical data of each phoneme obtained from a storage unit
storing statistical data related to a phoneme duration of each phoneme; and

codes for a step of generating a speech waveform by connecting phonemes using the phoneme duration.

24. A method of determining the duration of phonemes of a phoneme string in a method ... comprising
allocating individual duration of phonemes based on weights determined in accordance with stored statistical
data for respective phonemes.

25. A method of configuring a speech synthesis apparatus comprising the steps of deriving statistical data for the
duration of phonemes to be used in speech synthesis and storing the statistical data in said apparatus on a database
which is accessible for use in determining phoneme duration based on said statistical data when generating a
speech waveform for an input phoneme string.

26. An electrical signal carrying processor implementable instructions for controlling a processor to carry out the
method of any one of claims 12 to 22 and 24 to 25.